[WIP] Adaptive activation functions #497
base: master
        
          
src/pinns_pde_solve.jl (outdated)
    param_estim::PE
    additional_loss::AL
    adaptive_loss::ADA
    adaptive_activation_function::ADF
I don't think it needs to be added here. I think it just needs to be a property of the chain and added to the chain's weights list?
Take a look at how the Flux functors are built: https://fluxml.ai/Flux.jl/stable/models/functors/
> I think it just needs to be a property of the chain
Could you elaborate a bit on how that would be done?
I looked into how Flux functors are built and how they are used. Would it be implemented like this? It isn't exactly clear to me.
One way you might implement this idea in Flux is by generating a Flux.Chain for the entire network that is a sequence of three building blocks, repeated:
    Flux.Chain(
        Dense(in, out, σ=identity; bias=true, init=nn_param_init),
        AdaptiveActivation(n, a),
        NonlinearActivation(nonlinearity),
    )
The entire network would be those three layers repeated once per hidden layer. You'd then have to write a struct for AdaptiveActivation, which would simply multiply its input elementwise by the current value of n * a (very similar to the Diagonal implementation in Flux), and a NonlinearActivation struct, which would have no trainable parameters and would simply apply the desired nonlinearity after the adaptive activation. In total you'd be recreating the Dense layer with an extra elementwise operation in the middle.
This part of the Flux code would be a good reference, since the building blocks you'd want to implement are similar to Dense and Diagonal, and you can see how the @functor macro gets used.
https://github.com/FluxML/Flux.jl/blob/ef04fda844ea05c45d8cc53448e3b25513a77617/src/layers/basic.jl#L82-L122
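To make the two building blocks described above concrete, here is a minimal sketch in plain Julia with no Flux dependency. The struct names follow the thread; the helper `adaptive_activation`, the per-neuron layout of `a`, and the `n * a == 1` initialisation are assumptions made for illustration, not something this PR has settled on.

```julia
# Hypothetical sketch in plain Julia (no Flux dependency); names follow the thread.

# Scales its input elementwise by n .* a; `a` is the parameter that would be trained.
struct AdaptiveActivation{T<:AbstractVector}
    n::Float64  # fixed scale factor (hyperparameter)
    a::T        # per-neuron slopes
end

# Hypothetical helper: `width` slopes initialised so that n * a == 1 (an assumed default).
adaptive_activation(width::Integer; n::Real = 10.0) =
    AdaptiveActivation(Float64(n), fill(1 / n, width))

# Forward pass: works for a vector or a (features x batch) matrix.
(l::AdaptiveActivation)(x::AbstractVecOrMat) = (l.n .* l.a) .* x

# Applies a fixed nonlinearity elementwise; holds no trainable parameters.
struct NonlinearActivation{F}
    f::F
end
(l::NonlinearActivation)(x) = l.f.(x)

# With Flux loaded, `Flux.@functor AdaptiveActivation` is what would expose the
# fields to the optimiser; which fields end up trainable is left open here.

scale = adaptive_activation(3; n = 5.0)
act = NonlinearActivation(tanh)
y = act(scale(ones(3, 2)))   # 3 features, batch of 2
```

Because both structs are callable, they could sit after a `Dense(...)` layer with an identity activation inside a `Flux.Chain`, which is the three-block pattern described in this comment.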
Interestingly, the first paper's version is the hardest to implement here, because you have to make sure that all of the AdaptiveActivation layers use the same value for a (this is known as weight tying or weight sharing). That falls under what is covered in this doc for Flux:
https://fluxml.ai/Flux.jl/stable/models/advanced/
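One way to picture the weight sharing mentioned above is to let every layer hold a reference to the same one-element array, so an in-place update is seen by all of them. The sketch below is plain Julia; the name `SharedSlope` and the one-element-vector layout are hypothetical, and how Flux's optimisers treat a shared array should be checked against the linked doc.

```julia
# Hypothetical layer whose slope lives in a 1-element vector, so that several
# layers can reference the very same array.
struct SharedSlope{T<:AbstractVector}
    n::Float64  # fixed scale factor
    a::T        # length-1 vector, shared by reference between layers
end

# Forward pass: scale every entry of x by the single shared value n * a.
(l::SharedSlope)(x) = (l.n * l.a[1]) .* x

a = [0.1]                                      # the one shared parameter
layers = [SharedSlope(10.0, a) for _ in 1:3]   # three layers, one slope

a[1] = 0.2                                     # stand-in for an in-place optimiser step
outputs = [l([1.0]) for l in layers]           # every layer sees the updated slope
```

The design relies on the update being in place: rebinding `a` to a new array would leave the layers pointing at the old one.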
I agree with Chris that most of that work should not go here, as it is mostly neural architecture related. I think it could make sense to add it to the DiffEqFlux repository, as was mentioned in the issue for this method. Alternatively we could make a new file in this repo such as  where you return a  The hyperparameters you would want to include (as arguments to that function) are at least: 
 and possibly others that I didn't think of that will become apparent to you during implementation.
Also, I think there's an issue with your line-ending commit style, and that's why almost every line has a change. Are you committing in Windows line-ending style? I believe we're using Unix line-ending style, and having the two be different would result in almost every line being changed constantly (like what is being observed here). I think it's an option in your git config settings.
Thanks for the pointers! I think I have a clearer idea of what to do now. I'll create a  Also yes, I have been committing with Windows line endings so far; I'll switch to Unix line endings for further commits.
Also, here's an example of using the 
c697e42 to 7ae9f97
I have rewritten the skeleton in a new file  I wanted to ask: 
    layer = Flux.Chain(
        Dense(in, out, σ=identity; bias=true, init=nn_param_init),
        AdaptiveActivation(n, a),
        NonlinearActivation(nonlinearity),
    ) # to be stacked for as many hidden layers specified (N)
Is this actually needed, or is the AdaptiveActivation enough? I think those 8 lines are all that's really needed right? And that could just be added to Flux's activation function list?
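For comparison, the single-layer alternative raised in this comment could look roughly like the sketch below, which fuses the scaling and the nonlinearity into one callable struct. The name `FusedAdaptiveActivation` is hypothetical, the code is plain Julia rather than Flux, and whether something like this fits Flux's activation function list is exactly the open question here.

```julia
# Hypothetical fused layer: applies f elementwise to (n .* a) .* x in one step.
struct FusedAdaptiveActivation{F,T<:AbstractVector}
    f::F        # nonlinearity, e.g. tanh
    n::Float64  # fixed scale factor
    a::T        # per-neuron slopes (the part that would be trained)
end

# Forward pass: scale, then apply the nonlinearity.
(l::FusedAdaptiveActivation)(x) = l.f.((l.n .* l.a) .* x)

fused = FusedAdaptiveActivation(tanh, 2.0, [0.5, 0.5])
out = fused([0.0, 1.0])   # same result as tanh.([0.0, 1.0]) because n * a == 1
```

With this shape a hidden layer would be a `Dense` with identity activation followed by one fused layer, rather than the three-block pattern.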
I've been really busy with a project deadline on Tuesday, I should be able to do a thorough review and guide after that.
Work-in-progress for #355